(57) A compiler comprises a loop detecting unit for extracting
information of loops, and a high-speed loop applying unit for
generating a first loop exclusive instruction, placing the instruction
immediately before the entry of a loop, generating second loop
exclusive instructions, and placing the instructions at each place to
branch to the entry of the loop. A processor comprises: a pipeline
comprising an instruction fetching unit, an instruction decoding unit,
and an executing unit; a branch target storage unit; a branch target
registering unit for, after the instruction decoding unit has decoded
a first loop exclusive instruction, registering branch target
information of an instruction succeeding to the first loop exclusive
instruction in the branch target storage unit; and a branch executing
unit for, after the decoding unit has decoded a second loop exclusive
instruction, judging whether to execute a loop and, if it judges to
execute the loop, reading the branch target information registered in
the branch target storage unit and controlling the pipeline so that
the program executes the loop using the read branch target
information.

Description

BACKGROUND OF THE INVENTION 

(1) Field of the Invention

This invention relates to a compiler for compiling source programs
into machine-language instruction sequences and to a processor for
executing the machine-language instruction sequences using pipeline
processing, and especially relates to a compiler and a processor for
executing loops at high speed.

(2) Description of the Prior Art 

Pipeline processing is known as one of the fundamental techniques for
achieving high-speed processing by a Central Processing Unit (CPU:
hereinafter processor). In pipeline processing, the process dealing
with an instruction is divided into smaller units, or pipeline stages,
and the pipeline stages are processed at the same time to improve the
processing speed. However, the technique is not effective in
processing loops because stalls occur when branch instructions are
processed. Due to the stalls, the operational performance of the
pipeline processing does not reach the ideal performance. This
phenomenon is called a branch hazard.

Now, the branch hazard is explained with reference 
to Fig.1 and Fig.2. 

Fig. 1 shows a source program in which an addition 
and a multiplication between two integers are repeated 
three times each. 

Fig.2 shows a machine-language instruction sequence obtained by
compiling the source program of Fig.1. The operands and instructions
used in the instruction sequence are as follows:



a, b, c, d, i :  Registers assigned to integer variables.
mov 0,i :        Transfer 0 into i.
L :              Label.
add a,b,c :      Transfer sum of a and b into c.
mul a,b,d :      Transfer result of multiplication a * b into d.
add i,1,i :      Add 1 to i.
cmp i,3 :        Compare i with 3.
bcc L :          Branch to L if comparison result of "cmp i,3" is i < 3.



When the instruction sequence of Fig.2 is executed,
instructions from "add a,b,c" to "bcc L" loop three times.

Fig.3 shows a flow of a pipeline formed when the instruction sequence
of Fig.2 is executed, showing the operation of the pipeline at each
clock cycle. The pipeline comprises three stages: IF for fetching
instructions; DEC for decoding instructions; and EX for executing
instructions and generating effective addresses. An instruction
fetched at the IF stage is executed at the EX stage two clock cycles
later. After executing branch instruction "bcc L" at clock cycle 8,
the processor recognizes instruction "add a,b,c" as the instruction to
be executed next, and fetches instruction "add a,b,c" at clock cycle
9. That is, the processor invalidates instructions over two clock
cycles after executing branch instruction "bcc L" at clock cycle 8,
and executes instruction "add a,b,c" next. That means a pipeline stall
of two clock cycles occurs each time the processor loops once.
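
The stall described above can be reproduced with a small simulation. The following Python sketch is an illustration only, not part of the disclosed apparatus: it models the three stages IF, DEC and EX as simple slots, resolves "bcc" in the EX stage, and on a taken branch discards whatever is in the IF and DEC slots before refetching from label L. The tuple encoding of the instructions, the function name simulate, and the initial register values are assumptions made for the example.

```python
# Fig.2 program as (opcode, operands...) tuples; index 1 is label L.
PROGRAM = [
    ("mov", 0, "i"),         # i = 0
    ("add", "a", "b", "c"),  # L: c = a + b
    ("mul", "a", "b", "d"),  # d = a * b
    ("add", "i", 1, "i"),    # i = i + 1
    ("cmp", "i", 3),         # flag = (i < 3)
    ("bcc", 1),              # branch to L if flag is set
]


def simulate(program, initial_regs):
    """Run a 3-stage (IF, DEC, EX) pipeline; return (cycles, trace, regs)."""
    regs = dict(initial_regs)

    def value(x):
        # Operands are register names (str) or immediate integers.
        return regs[x] if isinstance(x, str) else x

    flag = False
    pc = 0                     # address of the next instruction to fetch
    if_slot = dec_slot = None  # instruction indices held in IF and DEC
    cycle = 0
    trace = []                 # (cycle, index) of each instruction reaching EX
    while pc < len(program) or if_slot is not None or dec_slot is not None:
        cycle += 1
        ex, dec_slot = dec_slot, if_slot       # advance the pipeline
        if pc < len(program):
            if_slot, pc = pc, pc + 1           # fetch one instruction
        else:
            if_slot = None
        if ex is None:
            continue                           # bubble in the EX stage
        trace.append((cycle, ex))
        op, *args = program[ex]
        if op == "mov":
            regs[args[1]] = value(args[0])
        elif op == "add":
            regs[args[2]] = value(args[0]) + value(args[1])
        elif op == "mul":
            regs[args[2]] = value(args[0]) * value(args[1])
        elif op == "cmp":
            flag = value(args[0]) < value(args[1])
        elif op == "bcc" and flag:
            if_slot = dec_slot = None          # invalidate younger instructions
            pc = args[0]                       # refetch from the branch target
    return cycle, trace, regs


total, trace, regs = simulate(PROGRAM, {"a": 2, "b": 5})
```

In this model the first "bcc L" reaches EX at clock cycle 8 and the refetched "add a,b,c" reaches EX at clock cycle 11, so two EX cycles are lost for every taken branch, in line with the description of Fig.3.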

There are three known methods for avoiding the branch hazard: (a)
Delayed branch (see, for example, David A. Patterson and John L.
Hennessy, Computer Architecture: A Quantitative Approach, Morgan
Kaufmann Publishers, 1990); (b) Loop repeat; and (c) Branch target
buffer.

(a) Delayed branch 

In this method, loops are scheduled when a compiler compiles a source
program, and a valid instruction is moved to a branch-delay slot,
namely an instruction position after a branch instruction. With this
arrangement, no instruction is invalidated when a branch instruction
is executed. The instructions to be moved to the branch-delay slot are
a pre-branch instruction, a branch target instruction, and a
post-branch instruction.

(b) Loop repeat 

In this method, the entry and exit addresses of a loop, the number of
repetitions of the loop and the like are stored in an exclusive
register in the processor before the loop is executed. With such an
arrangement, the address to which the program returns in a loop does
not need to be computed, and the branch hazard problem is solved.

(c) Branch target buffer 

In this method, when the processor branches to a new address for the
first time, a branch target address and a branch target instruction
sequence are stored in an exclusive buffer in the processor called a
branch target buffer. Then, when the program branches to the address
stored in the branch target buffer, the stored branch target
instruction sequence is fetched to be executed. With such an
arrangement, when an instruction sequence starting from the same
address is executed repeatedly, accessing the branch target buffer
suffices in the second and subsequent executions. Therefore, the
instructions do not need to be fetched from an external memory,
solving the branch hazard problem.
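
As a rough illustration of the branch target buffer method, the Python sketch below keeps the first few instructions found at each branch target and serves later branches to the same target from the buffer. The class name, the depth parameter and the two counters are hypothetical and exist only for this example; a real buffer is a fixed-size hardware structure.

```python
class BranchTargetBuffer:
    """Toy branch target buffer keyed by branch target address."""

    def __init__(self, memory, depth=2):
        self.memory = memory        # list standing in for the external memory
        self.depth = depth          # instructions stored per target (assumed)
        self.entries = {}           # target address -> stored instructions
        self.lookups = 0            # buffer checks performed
        self.memory_fetches = 0     # instructions read from external memory

    def branch_to(self, target):
        """Return the instruction sequence at the branch target."""
        self.lookups += 1           # the buffer is checked on every branch
        if target not in self.entries:
            # First branch to this address: fetch from memory and register.
            self.entries[target] = self.memory[target:target + self.depth]
            self.memory_fetches += self.depth
        return self.entries[target]


memory = ["mov 0,i", "add a,b,c", "mul a,b,d", "add i,1,i", "cmp i,3", "bcc L"]
btb = BranchTargetBuffer(memory, depth=2)
first = btb.branch_to(1)    # miss: read from memory and store
second = btb.branch_to(1)   # hit: served from the buffer
```

Only the first branch to label L reads the external memory, but every branch pays for a lookup, which is the cost raised against this method later in this section.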

However, these prior-art techniques have the fol- 
lowing problems: 

(a) Delayed branch 

In this method, a branch instruction should not depend on its
pre-branch instruction when the pre-branch instruction is moved to a
branch-delay slot. Some pre-branch instructions may not meet this
condition. Even when there is a pre-branch instruction meeting the
above condition and the static scheduling has been completed, the
program may wait for a while at execution before an instruction in a
branch-delay slot or an instruction to be executed after a branch is
fetched. This is because, for example, an external memory in which
these instructions are stored may be a low-speed device, or an
external bus which is used to fetch such instructions may be occupied
by another process.

The same problem occurs when a branch target in- 
struction or a post-branch instruction is moved to a 
branch-delay slot, and furthermore, the performance im- 
provement depends on whether branches succeed or 
not. 

(b) Loop repeat 

To use this method, the number of repetitions in a loop should be
known before the loop is executed. That is, this method cannot be used
if the number of repetitions is determined through the execution of
the loop. Therefore, the use of this method is limited to operations
such as a repetitive numeric operation with a fixed format.

(c) Branch target buffer 

In this method, it must be checked whether a branch target address is
stored in a branch target buffer each time the program branches. This
increases the number of processes performed in a clock cycle, making
it difficult to shorten the clock cycle.

SUMMARY OF THE INVENTION 

The first object of the present invention, in consideration of the
above-mentioned problems, is to provide a compiler and a processor for
processing loops at high speed, without being affected by a dependency
between a branch instruction and the pre-branch or branch target
instruction in loops in source programs, and without generating any
branch hazard.

The second object of the present invention is to pro- 
vide a compiler and a processor for processing loops at 
high speed, without a necessity of computing a branch 
target address each time the program returns to the en- 
try of the loop. 

The third object of the present invention is to provide 
a compiler and a processor for processing loops at high 
speed, without a necessity of judging whether high- 
speed loop processing can be applied to a loop process 
each time a branch instruction is executed. 

The above objects are fulfilled by a compiler for generating a program
containing a machine-language instruction sequence by compiling a
source program, comprising: a loop detecting unit for detecting
certain loops which exist in the source program and extracting
information for specifying the loops; and a high-speed loop applying
unit comprising: a first loop exclusive instruction generating unit
for generating a first loop exclusive instruction which indicates that
a succeeding instruction is an entry of the loop and placing the first
loop exclusive instruction immediately before the entry of the loop in
the machine-language instruction sequence; and a second loop exclusive
instruction generating unit for generating second loop exclusive
instructions which direct the program to branch to the entry of the
loop and placing the second loop exclusive instructions at places from
where the program branches to the entry of the loop, the first loop
exclusive instruction generating unit and the second loop exclusive
instruction generating unit operating based on the information
extracted by the loop detecting unit.

The instruction sequence output from the above compiler is processed
by a processor, comprising: a pipeline comprising: a fetching unit for
fetching instructions one by one from the instruction sequence; a
decoding unit for decoding the instructions fetched by the fetching
unit; and an executing unit for executing the instructions decoded by
the decoding unit; a branch target storage unit; a registering unit
for, after the decoding unit has decoded a first loop exclusive
instruction, registering branch target information related to an
instruction succeeding to the first loop exclusive instruction in the
branch target storage unit; and a branch executing unit for, after the
decoding unit has decoded a second loop exclusive instruction, judging
whether to execute a loop and, if it judges to execute the loop,
reading the branch target information registered in the branch target
storage unit and controlling the pipeline so that the program executes
the loop using the read branch target information.

When the machine-language instruction sequence generated by the
compiler is executed by the processor, the processor does not need to
compute an address, nor fetch an instruction and decode the
instruction to repeat the process of the loop.

The processor may further comprise a clearing unit for, after the
decoding unit has decoded a third loop exclusive instruction, clearing
the branch target information registered in the branch target storage
unit.

With such arrangements, since unnecessary branch target information is
cleared, the control of the pipeline by the branch executing unit is
simplified even if multiple pieces of branch target information are
registered in the branch target storage unit, as in the case of
multiple loop nesting.

The branch target information may be the address of an instruction
succeeding to the first loop exclusive instruction, and the branch
executing unit, if having judged to execute a loop, may use the
address to control the pipeline.


The branch target information may be an address of an instruction
succeeding to the first loop exclusive instruction and a certain
number of instructions succeeding to the first loop exclusive
instruction, and the branch executing unit, if having judged to
execute a loop, may use the address and an address which is obtained
by performing a certain computation on an address specified by the
second loop exclusive instruction to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to compute the first address of the loop, nor fetch an
instruction sequence starting from the first instruction, nor compute
an address for an instruction sequence succeeding to the instruction
sequence.

The branch target information may be an address of an instruction
succeeding to the first loop exclusive instruction and a certain
number of instructions succeeding to the first loop exclusive
instruction, and the branch executing unit, if having judged to
execute a loop, may use the information to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to compute the first address of the loop, nor fetch an
instruction sequence starting from the first instruction, nor compute
an address for an instruction sequence succeeding to the instruction
sequence.

The branch target information may be a first address of an instruction
succeeding to the first loop exclusive instruction, a certain number
of instructions succeeding to the first loop exclusive instruction,
and a second address of an instruction which is to be executed
immediately after the certain number of instructions, and the branch
executing unit, if having judged to execute a loop, may use the
information to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to compute the first address of an instruction succeeding to the
first loop exclusive instruction, nor fetch a certain number of
instructions succeeding to the first loop exclusive instruction, nor
compute the second address of an instruction which is to be executed
immediately after the certain number of instructions.

The branch target information may be a certain number of instructions
succeeding to the first loop exclusive instruction and an address of
an instruction which is to be executed immediately after the certain
number of instructions in the branch target storage unit, and the
branch executing unit, if having judged to execute a loop, may use the
information and an address specified by the second loop exclusive
instruction to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to compute the first address of an instruction succeeding to the
first loop exclusive instruction, nor fetch a certain number of
instructions succeeding to the first loop exclusive instruction, nor
compute the address for the instruction sequence which is to be
executed immediately after the certain number of instructions.

The branch target information may be a certain number of instructions
succeeding to the first loop exclusive instruction and an address of
an instruction which is to be executed immediately after the certain
number of instructions, and the registering unit, after the decoding
unit has decoded a first loop exclusive instruction, may use the
information to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to fetch a certain number of instructions succeeding to the first
loop exclusive instruction, nor compute the address for the
instruction sequence which is to be executed immediately after the
certain number of instructions.

The branch target information may be an address of an instruction
succeeding to the first loop exclusive instruction and a decoded
certain number of instructions succeeding to the first loop exclusive
instruction, and the branch executing unit, if having judged to
execute a loop, may use the information and an address obtained by
performing a certain computation on the address specified by the
second loop exclusive instruction to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to compute the first address of an instruction succeeding to the
first loop exclusive instruction, nor fetch and decode a certain
number of instructions succeeding to the first loop exclusive
instruction, nor compute the address for the instruction sequence
which is to be executed immediately after the certain number of
instructions.

The branch target information may be an address of an instruction
succeeding to the first loop exclusive instruction and a decoded
certain number of instructions succeeding to the first loop exclusive
instruction, and the branch executing unit, if having judged to
execute a loop, may use the information to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to compute the first address of an instruction succeeding to the
first loop exclusive instruction, nor fetch and decode a certain
number of instructions succeeding to the first loop exclusive
instruction.

The branch target information may be a first address of an instruction
succeeding to the first loop exclusive instruction, a decoded certain
number of instructions succeeding to the first loop exclusive
instruction, and a second address of an instruction which is to be
executed immediately after the certain number of instructions, and the
branch executing unit, if having judged to execute a loop, may use the
information to control the pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to compute the first address of an instruction succeeding to the
first loop exclusive instruction, nor fetch and decode a certain
number of instructions succeeding to the first loop exclusive
instruction, nor compute the address for the instruction sequence
which is to be executed immediately after the certain number of
instructions.

The branch target information may be a decoded 
certain number of instructions succeeding to the first 
loop exclusive instruction and an address of an instruc- 
tion to be executed immediately after the first loop ex- 
clusive instruction, and the branch executing unit, if hav- 
ing judged to execute a loop, may use the information 
to control the pipeline. 

With such arrangements, the processor, if it exe- 
cutes a loop, does not need to compute the first address 
of an instruction succeeding to the first loop exclusive 
instruction, nor fetch and decode a certain number of 
instructions succeeding to the first loop exclusive in- 
struction, nor compute the address for the instruction se- 
quence which is to be executed immediately after the 
certain number of instructions. 

The branch target information may be a decoded certain number of
instructions succeeding to the first loop exclusive instruction and an
address of an instruction to be executed immediately after the first
loop exclusive instruction, and the branch executing unit, if having
judged to execute a loop, may use the information to control the
pipeline.

With such arrangements, the processor, if it executes a loop, does not
need to fetch and decode a certain number of instructions succeeding
to the first loop exclusive instruction, nor compute the address for
the instruction sequence which is to be executed immediately after the
certain number of instructions.

As apparent from the above description, the present 
apparatus generates a high-speed loop instruction 
when compiling a source program, and registers the first 
address of a loop in an exclusive buffer in the processor, 
during the execution of the first repetition in the loop. 
After decoding a branch instruction designating a loop, 
the processor can obtain the address from the buffer. 

Consequently, the present apparatus does not need 
to invalidate the pipeline, nor compute the branch ad- 
dress, nor fetch the branch target instruction from a low- 
speed external memory for each repetition of the proc- 
ess of the loop, and repeats the process of the loop at 
high-speed. 

Also, since the present apparatus includes high-speed loop
instructions and a unit for executing the instructions that are unique
to the apparatus, the apparatus can register branch target
instructions in the exclusive buffer independently of the operation of
branch instructions. Furthermore, the number of repetitions of a loop
does not need to be known before the loop is executed, because a loop
exclusive instruction of the apparatus judges whether to execute a
loop each time a repetition of the loop ends, and the instructions or
addresses registered in the exclusive buffer are used only when it is
judged to execute a loop.

As a result, the present apparatus is effective in processing loops at
high speed because the apparatus is independent of the contents of the
loops and the number of repetitions in the loops.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects, advantages and features 
of the invention will become apparent from the following 
description thereof taken in conjunction with the accom- 
panying drawings which illustrate a specific embodi- 
ment of the invention. In the drawings: 

Fig.1 shows a source program in which an addition
and a multiplication between two integers are repeated
three times each.

Fig.2 shows a machine-language instruction se- 
quence obtained by compiling the source program using 
a prior-art compiler. 

Fig.3 shows a flow of a pipeline formed when a pri- 
or-art processor executes the instruction sequence of 
Fig.2. 

Fig.4 is a block diagram showing a construction of 
a data processing apparatus used in First to Eleventh 
Embodiments of the present invention. 

Fig.5 is a flowchart of an operational procedure of 
compiler 102 used in the embodiments of the present 
invention. 

Fig.6 is a flowchart of an operational procedure of 
processor 107 used in the embodiments of the present 
invention. 

Fig.7 shows machine-language instruction se- 
quence 106 used in the embodiments of the present in- 
vention. 

Fig.8 shows a flow of a pipeline formed when proc- 
essor 107 executes the instruction sequence of Fig.7 in 
First Embodiment. 

Fig.9 is a table showing the pipeline at clock cycles 
8 to 10 using symbols. 

Fig.10 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Second Embodiment.

Fig.11 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Third Embodiment.

Fig.12 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Fourth Embodiment.

Fig.13 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Fifth Embodiment.

Fig.14 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Sixth Embodiment.

Fig.15 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Seventh Embodiment.

Fig.16 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Eighth Embodiment.

Fig.17 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Ninth Embodiment.

Fig.18 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Tenth Embodiment.

Fig.19 shows a flow of the pipeline formed when the data processing
apparatus executes loop exclusive branching instruction "lcc" in
Eleventh Embodiment.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

Preferred embodiments of the present invention are 
explained below with reference to the drawings. 

<First Embodiment> 

<Construction> 

Fig.4 is a block diagram showing a construction of 
a data processing apparatus of First Embodiment of the 
present invention. Note that Fig.4 also shows source 
program 101 to be processed by the present apparatus 
and machine-language instruction sequence 106 gen- 
erated by the present apparatus intermediately. 

The present apparatus is divided into compiler 102
and processor 107 as a whole.

Compiler 102 compiles source program 101 written in a high-level
language into a machine-language instruction sequence and outputs the
sequence as machine-language instruction sequence 106. Compiler 102
comprises loop detecting unit 103, loop storage unit 104, and
high-speed loop applying unit 105.

Loop detecting unit 103 detects all loops in a given source program
that satisfy certain conditions, and stores positional information for
specifying the entries and exits of the detected loops as loop
information into loop storage unit 104. Note that the term "loop" used
in this document indicates a loop having one entry and one or more
exits and having no possibility of overlapping with another loop. Also
note that the entry of a loop is a place where the first executable
instruction of the loop is written, and the exit of a loop is a place
where the last executable instruction in a repetition of the loop is
written.

Loop detecting unit 103 detects loops after checking whether certain
instructions such as a "do" or "while" instruction are written in the
source program. Alternatively, loop detecting unit 103 may detect
loops by analyzing the flow of control (see, for example, Alfred V.
Aho, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles,
Techniques, and Tools, Addison-Wesley Publishing Company, 1985). Note
that "all loops in a given source program that satisfy certain
conditions" are the following loops:

(a) Independent loops which are not nested in other loops and do not
have any nested loops in themselves.

(b) Up to two nested loops from the innermost when one or more loops
are nested in a loop.

The reason why only two loops are allowed in (b) is 
that processes of loops are limited by the capacity of 
branch target storage unit 114 (explained later). 
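
Conditions (a) and (b) can be sketched as a selection over a loop-nesting tree. The Python code below is one possible reading, stated as an assumption: a loop qualifies when the nest below it, counted from the innermost loop, is fewer than two levels deep, so that an innermost loop and the loop directly enclosing it are both selected. The Loop class, the height helper and the max_depth parameter are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    name: str
    inner: list = field(default_factory=list)  # loops nested directly inside


def height(loop):
    """Nesting levels below this loop (0 for an innermost loop)."""
    if not loop.inner:
        return 0
    return 1 + max(height(child) for child in loop.inner)


def select_loops(top_level, max_depth=2):
    """Names of loops lying within max_depth levels of an innermost loop."""
    selected = []

    def visit(loop):
        if height(loop) < max_depth:
            selected.append(loop.name)
        for child in loop.inner:
            visit(child)

    for loop in top_level:
        visit(loop)
    return selected


# A contains B, B contains C; D is an independent loop.
nest = [Loop("A", [Loop("B", [Loop("C")])]), Loop("D")]
chosen = select_loops(nest)
```

With this reading, the triple nest contributes its two innermost loops B and C, the independent loop D is selected under condition (a), and the outermost loop A is left to ordinary branch instructions.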

Loop storage unit 104 comprises a RAM and other components and
temporarily stores the loop information sent from loop detecting unit
103 for each loop. The loop information comprises positional
information for specifying statements in the source program that
correspond to entries and exits of the loops.

High-speed loop applying unit 105 generates three types of
machine-language instructions (hereinafter high-speed loop
instructions) for high-speed loop processing. The three types of
high-speed loop instructions are as follows:

    Branch target registering instruction, which is written at a place
    immediately before the entry of a loop;
    Branch target clearing instruction, which is written immediately
    after an exit of a loop; and
    Loop exclusive branching instruction, which is written at each
    place to branch to the entry of a loop.

"A place immediately after an exit of a loop" has an instruction which
is executed after the program leaves the loop. "Each place to branch
to the entry of a loop" is each place from where the program may
return to the entry of the loop to repeat the process.
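
The placement of the three high-speed loop instructions can be sketched as a pass over an already compiled instruction sequence. The Python code below is a minimal illustration under stated assumptions, not the procedure of compiler 102: the mnemonics "setlb" and "clrlb" are hypothetical placeholders for the branch target registering and clearing instructions, "lcc" is used as the mnemonic of the loop exclusive branching instruction, and the loop information is reduced to the indices of the first and last instructions of the loop.

```python
# Prior-art sequence in the style of Fig.2, as (label, instruction) pairs.
FIG2 = [
    (None, "mov 0,i"),
    ("L", "add a,b,c"),
    (None, "mul a,b,d"),
    (None, "add i,1,i"),
    (None, "cmp i,3"),
    (None, "bcc L"),
]


def apply_high_speed_loop(code, entry, exit_):
    """Insert high-speed loop instructions around the loop code[entry..exit_]."""
    label = code[entry][0]
    out = []
    for idx, (lab, text) in enumerate(code):
        if idx == entry:
            # Registering instruction immediately before the loop entry.
            out.append((None, "setlb"))
        if entry <= idx <= exit_ and text == f"bcc {label}":
            # Each branch back to the entry becomes a loop exclusive branch.
            text = f"lcc {label}"
        out.append((lab, text))
        if idx == exit_:
            # Clearing instruction immediately after the loop exit.
            out.append((None, "clrlb"))
    return out


result = apply_high_speed_loop(FIG2, entry=1, exit_=5)
```

The label L stays on the first instruction of the loop, so the registering instruction is executed once on the way into the loop and is not part of the repeated body.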

Note that all the statements in the source program other than those of
loops are also compiled by compiler 102 into machine-language
instructions. These processes are not explained here because they are
the same as those performed by general compilers.

Processor 107 receives machine-language instruction sequence 106 from
compiler 102, then fetches the instructions one by one from the
sequence to decode and execute them. Processor 107 comprises
instruction fetching unit 108, instruction decoding unit 109,
executing unit 110, branch target registering unit 111, branch target
clearing unit 112, branch executing unit 113, and branch target
storage unit 114. Each component of processor 107 operates
synchronously with a clock signal from a clock generator which is not
shown in the figures. A pipeline comprises instruction fetching unit
108, instruction decoding unit 109, and executing unit 110, and each
unit sequentially transfers an instruction to the next unit in the
direction shown in the figure, in synchronization with the clock
signal.

Instruction fetching unit 108 fetches an instruction at one clock
cycle from machine-language instruction sequence 106 which is stored
in an external memory (not shown in the figures), then sends the
instruction to instruction decoding unit 109 at the next clock cycle.
Instruction fetching unit 108 comprises fetch counter 108a and fetch
instruction buffer 108b.

Fetch counter 108a designates an address of an instruction to be
fetched next, and outputs the address to the external memory which
stores machine-language instruction sequence 106. After the
instruction is fetched, the value of fetch counter 108a is incremented
by one by an incrementing device which is not shown in the figures,
and updated to an address to be fetched next. However, if a direction
comes from branch executing unit 113, the direction takes priority
over the above process, and the value of fetch counter 108a is updated
to an address sent from branch executing unit 113.
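
The update rule of fetch counter 108a can be expressed in a few lines. The following Python sketch is a behavioral illustration only; the class name, the tick method and the redirect parameter, which stands for an address sent from branch executing unit 113, are hypothetical.

```python
class FetchCounter:
    """Behavioral model of a fetch counter with a priority redirect."""

    def __init__(self, start=0):
        self.value = start  # address of the instruction to be fetched next

    def tick(self, redirect=None):
        """Return the address fetched this cycle, then update the counter."""
        fetched = self.value
        if redirect is not None:
            # A direction from the branch executing unit takes priority.
            self.value = redirect
        else:
            # Normal case: increment by one to the next address.
            self.value = fetched + 1
        return fetched


fc = FetchCounter(start=4)
a = fc.tick()             # fetch address 4, counter becomes 5
b = fc.tick(redirect=1)   # fetch address 5, counter is overridden to 1
c = fc.tick()             # fetch address 1, counter becomes 2
```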

Fetch instruction buffer 108b comprises a register 
and stores an instruction fetched from machine-lan- 
guage instruction sequence 106. 

Instruction decoding unit 109 comprises decode counter 109a
and decode instruction buffer 109b, and decodes instructions
stored in decode instruction buffer 109b. When instruction
decoding unit 109 judges that the instruction stored in decode
instruction buffer 109b is any of the high-speed loop
instructions, instruction decoding unit 109 activates branch
target registering unit 111, branch target clearing unit 112,
or branch executing unit 113 according to the stored high-speed
loop instruction. On the other hand, when it is judged that the
instruction stored in decode instruction buffer 109b is an
instruction other than the high-speed loop instructions,
instruction decoding unit 109 decodes the instruction and sends
the decoded instruction to executing unit 110. Note that
instructions decoded by instruction decoding unit 109 are also
called micro instructions.
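
The selection made by instruction decoding unit 109 can be sketched as a simple lookup. This is a hedged Python sketch: the function name `dispatch`, the dictionary, and the mnemonic spellings "set", "clr", and "lcc" used as keys are assumptions for illustration, and the returned strings merely name the unit that would be activated.

```python
# Hypothetical mapping from high-speed loop mnemonics to the unit activated.
HIGH_SPEED_LOOP_UNITS = {
    "set": "branch target registering unit 111",
    "clr": "branch target clearing unit 112",
    "lcc": "branch executing unit 113",
}


def dispatch(instruction: str) -> str:
    """Name the unit that handles the instruction held in the decode buffer."""
    parts = instruction.split()
    mnemonic = parts[0] if parts else ""
    # Any instruction that is not a high-speed loop instruction is decoded
    # and sent on to the executing unit.
    return HIGH_SPEED_LOOP_UNITS.get(mnemonic, "executing unit 110")
```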

Decode counter 109a stores the address in the external memory
of the instruction currently stored in decode instruction
buffer 109b. Generally, decode counter 109a stores an address
sent from fetch counter 108a. However, if a direction comes
from branch executing unit 113, the value of decode counter
109a is updated to an address sent from branch executing unit
113.

Decode instruction buffer 109b stores an instruction sent from
fetch instruction buffer 108b or branch executing unit 113.

Executing unit 110 comprises execute counter 110a and
execution controlling unit 110b.

Execute counter 110a stores an address of the mi- 
cro instruction executed by executing unit 110 in the ex- 
ternal memory. Generally, execute counter 110a stores 
an address sent from decode counter 109a. However, 
if a direction comes from branch executing unit 113, the 
value of execute counter 110a is updated to an address 
sent from branch executing unit 113. 

Execution controlling unit 110b comprises an Arithmetic Logic
Unit (ALU) and a shifter, controls the components of processor
107 according to a micro instruction sent from instruction
decoding unit 109 or branch executing unit 113, and inputs or
outputs control signals connected to processor 107, not shown
in the figures.

Branch target registering unit 111 is activated by instruction
decoding unit 109 when the unit 109 judges that an instruction
stored in decode instruction buffer 109b is a branch target
registering instruction. The activated branch target
registering unit 111 reads an address in decode counter 109a
and an instruction in decode instruction buffer 109b at the
next clock cycle, and registers them in branch target storage
unit 114 as branch target information.

Branch target clearing unit 112 is activated by instruction
decoding unit 109 when the unit 109 judges that an instruction
stored in decode instruction buffer 109b is a branch target
clearing instruction. The activated branch target clearing unit
112 clears a pair of pieces of branch target information
registered in branch target storage unit 114.

Branch executing unit 113 is activated by instruction decoding
unit 109 when the unit 109 judges that an instruction stored in
decode instruction buffer 109b is a loop exclusive branching
instruction. The activated branch executing unit 113 reads a
pair of pieces of branch target information registered in
branch target storage unit 114, stores the address into fetch
counter 108a, then at the next clock cycle, stores the read
instruction into decode instruction buffer 109b.

Branch target storage unit 114 comprises Last In First Out
(LIFO) latches, and has a capacity of two pairs of pieces of
branch target information at maximum. The branch target
information is written to the unit 114 by branch target
registering unit 111 and cleared by branch target clearing unit
112.
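
The behaviour of branch target storage unit 114 can be modelled as a small stack. This is a minimal Python sketch under stated assumptions: the class name `BranchTargetStorage`, its method names, and the choice to raise `OverflowError` when a third pair is registered are all hypothetical, since the text does not say what happens when the capacity is exceeded.

```python
class BranchTargetStorage:
    """LIFO store of (address, instruction) pairs with a fixed capacity."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self._pairs = []

    def register(self, address, instruction) -> None:
        """Push one pair, as the branch target registering unit does."""
        if len(self._pairs) >= self.capacity:
            # Assumed behaviour: the text does not define the overflow case.
            raise OverflowError("branch target storage is full")
        self._pairs.append((address, instruction))

    def read(self):
        """Return the most recently registered pair without removing it."""
        return self._pairs[-1]

    def clear(self):
        """Remove and return the most recently registered pair."""
        return self._pairs.pop()

    @property
    def remaining(self) -> int:
        """Number of pairs that can still be registered."""
        return self.capacity - len(self._pairs)
```

With a last-in, first-out order, the pair registered most recently is the one read and cleared first.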

<Operation>

Now, how the data processing apparatus with the above
construction operates is explained. Figs. 5 and 6 are
flowcharts of compiler 102 and processor 107 of the present
apparatus.

Suppose the source program shown in Fig.1 is input to compiler
102.

When the source program is input to compiler 102 (step S201),
loop detecting unit 103 detects loops included in the source
program (step S202). In the source program of Fig.1, a "for"
statement designates a loop. Therefore, loop detecting unit 103
stores positional information specifying the entry and exit of
the loop into loop information storage unit 104 (step S203).

High-speed loop applying unit 105 reads loop information
stored in loop information storage unit 104, and outputs branch
target registering instruction "set" to the place immediately
before the entry of the loop (step S204); branch target
clearing instruction "clr" immediately after the exit (step
S205); and loop exclusive branching instruction "lcc L" to
places to branch to the entry of the loop (step S206). Note
that compiler 102 outputs general execution instructions for
the other parts of the source program.
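
Steps S204 to S206 can be sketched as a single pass over an instruction list. This is an illustrative Python sketch only: the function name `apply_high_speed_loop`, its index-based parameters, and the list-of-strings representation are assumptions, and it handles one loop whose branches are written with the "bcc" mnemonic.

```python
def apply_high_speed_loop(instructions, entry, exit_, branches):
    """Insert "set"/"clr" and rewrite the loop branches for one loop.

    entry    -- index of the first instruction of the loop
    exit_    -- index of the last instruction of the loop
    branches -- indices of the instructions that branch to the entry
    """
    out = []
    for i, ins in enumerate(instructions):
        if i == entry:
            out.append("set")                    # immediately before the entry
        if i in branches:
            ins = ins.replace("bcc", "lcc", 1)   # loop exclusive branch
        out.append(ins)
        if i == exit_:
            out.append("clr")                    # immediately after the exit
    return out
```

All other instructions pass through unchanged, which corresponds to the compiler outputting general execution instructions for the rest of the program.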

Machine-language instruction sequence 106 shown in Fig.7,
output by compiler 102, and the machine-language instruction
sequence shown in Fig.2, output by the prior-art apparatus, are
results of compiling the same source program. As understood
from the comparison between the two figures, the
machine-language instruction sequence of Fig.7 additionally has
branch target registering instruction "set" and branch target
clearing instruction "clr", and branching instruction "bcc L"
in Fig.2 is replaced by loop exclusive branching instruction
"lcc L".

Machine-language instruction sequence 106 is output to the
external memory and is executed by processor 107 (steps S301 to
S309).

Instruction fetching unit 108 fetches instructions 
one by one from machine-language instruction se- 
quence 106 (step S301). 

The fetched instruction is transferred to instruction decoding
unit 109 and decoded at the next clock cycle (steps S302 to
S307). When the unit 109 judges as a result of the decoding
that the instruction is any of high-speed loop instructions
"set", "clr", and "lcc", a process according to the instruction
is performed (steps S303, S305, and S307). If the instruction
is not any of the high-speed loop instructions, the decoded
instruction is transferred to executing unit 110 at the next
clock cycle and executed (step S308).

Steps S301 to S308 are repeated until the last instruction of
machine-language instruction sequence 106 has been executed
(step S309).

Now, how each component of processor 107 operates when
high-speed loop instruction "set", "clr", or "lcc" is executed
is explained in detail.

Fig.8 shows how units 108 to 110 of the pipeline in processor
107 operate when the machine-language instruction sequence
shown in Fig.7 is executed. Columns "IF", "DEC", and "EX" in
Fig.8 show the instructions handled by instruction fetching
unit 108, instruction decoding unit 109, and executing unit 110
respectively.

Branch target registering instruction "set" fetched at clock
cycle 2 is transferred to decode instruction buffer 109b at
clock cycle 3. Then, instruction decoding unit 109 recognizes
the instruction transferred to decode instruction buffer 109b
as branch target registering instruction "set", and activates
branch target registering unit 111.

The activated branch target registering unit 111 reads an
address in decode counter 109a and an instruction in decode
instruction buffer 109b at the next clock cycle, and registers
them in branch target storage unit 114 as branch target
information. In this case, the instruction is "add a,b,c", and
the address is the address of the instruction in the external
memory. Note that since branch target storage unit 114 has a
capacity of up to two pairs of pieces of branch target
information, after this pair of pieces of information has been
registered, unit 114 has a remaining capacity of one more pair
of pieces of branch target information.

Loop exclusive branching instruction "lcc" fetched at clock
cycle 7 is transferred to decode instruction buffer 109b at
clock cycle 8. Then, instruction decoding unit 109 recognizes
the instruction transferred to decode instruction buffer 109b
as loop exclusive branching instruction "lcc", and activates
branch executing unit 113. The activated branch executing unit
113 transfers a value to fetch counter 108a. This value is
obtained by adding "4", representing the size of instruction
"add a,b,c", to the address registered in branch target storage
unit 114. The value is equal to the address of the instruction
stored after instruction "add a,b,c". Branch executing unit 113
then transfers the address and the instruction registered in
branch target storage unit 114 to decode counter 109a and
decode instruction buffer 109b respectively.

As a result, at clock cycle 9, instruction fetching unit 108
fetches instruction "mul a,b,d" from the external memory,
instruction decoding unit 109 decodes instruction "add a,b,c"
and its address sent from branch target storage unit 114, and
executing unit 110 executes instruction "lcc L". When
instruction "lcc L" is executed, the same judgement as that of
instruction "bcc L" is made. In this case, it is judged that
the program should repeat the process, according to the result
of the previously executed instruction "cmp 1,3". As a result,
the second repetition in the loop is processed through clock
cycles 10 to 14.
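
The values that branch executing unit 113 hands to the pipeline in First Embodiment can be summarised in a short sketch. This is an illustrative Python sketch, not the hardware: the function name `loop_branch_first_embodiment` and the dictionary keys are assumptions, while the constant 4 is the instruction size stated in the text.

```python
INSTRUCTION_SIZE = 4  # size of instruction "add a,b,c" given in the text


def loop_branch_first_embodiment(registered_address, registered_instruction):
    """Return the values written when the loop exclusive branch is decoded.

    registered_address     -- address held in branch target storage unit 114
    registered_instruction -- instruction held in branch target storage unit 114
    """
    return {
        # Fetching resumes at the second instruction of the loop.
        "fetch_counter": registered_address + INSTRUCTION_SIZE,
        # The decode stage receives the first instruction and its address
        # directly, so the first instruction is not fetched from memory again.
        "decode_counter": registered_address,
        "decode_buffer": registered_instruction,
    }
```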

In Fig.9, symbols "IR", "IC", "DR", "DC", "ER", and "EC" stand
for components of units 108 to 110 shown in Fig.4, and are used
to indicate how address or instruction information is
transferred. BR1 and BR2 represent parts of branch target
storage unit 114, and are used to indicate respectively the
address of instruction "add a,b,c" and instruction "add a,b,c"
in this case. Operations related to loop exclusive branching
instruction "lcc" are characteristic of the present apparatus,
and are encircled by broken lines.

The symbols used at clock cycle 8 have the following meanings:

<Stage IF>

IR <- (IC): An instruction stored in the external memory at an
address stored in IC is transferred to IR.

IC <- BR1 + 4: An address stored in BR1 is incremented by 4
and is transferred to IC.

In Fig.9, the above transfers are described in two rows. This
indicates that the transfer in the upper row is performed in
the first half of one clock cycle, and the transfer in the
lower row in the latter half. In this case, transfer
IR <- (IC) is performed first, then IC <- BR1 + 4 is performed.
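
The ordering of the two transfers in stage IF can be illustrated as follows. This is a minimal Python sketch under assumptions: the function name `if_stage_cycle`, the dictionary used as external memory, and the sample addresses in the usage note are hypothetical.

```python
def if_stage_cycle(memory, ic, br1, size=4):
    """Model stage IF in the cycle in which the loop branch is decoded.

    First half of the cycle:  IR <- (IC)
    Second half of the cycle: IC <- BR1 + 4
    """
    ir = memory[ic]      # upper row: the fetch still uses the old IC
    ic = br1 + size      # lower row: IC is redirected afterwards
    return ir, ic
```

For example, with a hypothetical memory `{0x10: "add a,b,c", 0x14: "mul a,b,d", 0x24: "clr"}`, calling `if_stage_cycle(memory, 0x24, 0x10)` fetches the instruction at the old address first and only then redirects IC to the second instruction of the loop.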

<Stage DEC>

DR <- IR: An instruction stored in IR is transferred to DR.

DC <- IC: An address stored in IC is transferred to DC.

In Fig.9, the above transfers, DR <- IR and DC <- IC, are
described in one row, indicating that the transfers are
performed at the same time.




<Stage EX>

ER <- DR: An instruction stored in DR is transferred to ER.

EC <- DC: An address stored in DC is transferred to EC.

The above-described operations of the pipeline at clock cycle
8 are the same as those at clock cycles 13 and 18.

When the third repetition in the loop is executed at clock
cycles 15 to 18, the program judges, when executing instruction
"lcc L" at clock cycle 19, that the loop should be ended, then
fetches branch target clearing instruction "clr" placed
immediately after the exit of the loop at clock cycle 20. The
fetched branch target clearing instruction "clr" is transferred
to decode instruction buffer 109b at the next clock cycle 21.
Instruction decoding unit 109 recognizes the transferred
instruction as branch target clearing instruction "clr", and
activates branch target clearing unit 112.

The activated branch target clearing unit 112 clears
instruction "add a,b,c" and its address registered in branch
target storage unit 114. Note that branch target storage unit
114 regains its capacity of two pairs of pieces of branch
target information after the above operation.

As shown in the above description, the present apparatus
generates high-speed loop instructions when compiling source
programs, and the first instruction of the loop and its address
are registered in branch target storage unit 114, an exclusive
buffer in processor 107. After decoding a branch instruction
which designates a loop, processor 107 does not need to compute
a branch target address or fetch a branch target instruction
from a low-speed external memory. Instead, processor 107 can
obtain the branch target address and branch target instruction
from branch target storage unit 114.

Consequently, it is apparent from a comparison between the
pipelines of Fig.3 and Fig.8 that a branch hazard occurs in the
conventional apparatus, but not in the present apparatus, when
the same program including a loop is compiled and executed.
This indicates that the present apparatus has increased speed
in loop processing.

<Second Embodiment> 

Now, the data processing apparatus of Second Em- 
bodiment of the present invention is explained. The ap- 
paratus achieves high-speed loop processing by regis- 
tering only the first address of a loop in an exclusive buff- 
er. 

In First Embodiment, the first instruction of a loop and its
address are registered in branch target storage unit 114, which
is an exclusive buffer in processor 107, while in Second
Embodiment, only the address of the first instruction is
registered.

<Construction>

The construction of the data processing apparatus of the
present embodiment is the same as that of First Embodiment
shown in Fig.4. However, note that branch target storage unit
114 stores only the address of the first instruction of a loop.

<Operation>

The operation of the present apparatus is the same as that of
First Embodiment shown in Figs.5 and 6 except that branch
target registering instruction "set" and loop exclusive
branching instruction "lcc" generate operations different from
those of First Embodiment when executed. These points are
explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b and branch target
registering unit 111 is activated, the activated branch target
registering unit 111 reads the address transferred to decode
counter 109a at the next clock cycle, that is, the address of
instruction "add a,b,c" in the external memory, and registers
the address in branch target storage unit 114.

On the other hand, after loop exclusive branching instruction
"lcc" is transferred to decode instruction buffer 109b and
branch executing unit 113 is activated, the activated branch
executing unit 113 transfers the address registered in branch
target storage unit 114 to fetch counter 108a. Then, an
instruction, namely "add a,b,c", is fetched from the external
memory by referring to fetch counter 108a for its address.

Fig. 10 shows an operational flow in the pipeline at an
execution of loop exclusive branching instruction "lcc". Fig.
10 corresponds to Fig.9 in First Embodiment.

In stage IF at clock cycle 8, an instruction stored in the
external memory at an address stored in IC is transferred to IR
first, then an address stored in BR1 is transferred to IC.
Also, since a pipeline stall of one clock cycle occurs in the
present embodiment as shown in Fig. 10, branch executing unit
113 transfers 0 to DR and DC in stage DEC at clock cycle 9 to
invalidate the instruction in the stall.
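
The two writes made by branch executing unit 113 in Second Embodiment can be sketched as follows. This is a hedged Python sketch: the function name `loop_branch_second_embodiment` and the per-cycle dictionaries are assumptions used only to make the sequence explicit.

```python
def loop_branch_second_embodiment(br1):
    """Writes made when only the first address (BR1) is registered."""
    # Clock cycle 8, stage IF: the first instruction of the loop must be
    # fetched again from the external memory, so IC receives BR1 itself.
    cycle_8 = {"IC": br1}
    # Clock cycle 9, stage DEC: the instruction caught in the one-cycle
    # stall is invalidated by writing 0 to DR and DC.
    cycle_9 = {"DR": 0, "DC": 0}
    return cycle_8, cycle_9
```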

As apparent from the above description, the present apparatus
generates a high-speed loop instruction when compiling a source
program, and registers the first address of a loop in branch
target storage unit 114, an exclusive buffer in processor 107,
during the execution of the first repetition in the loop. After
decoding a branch instruction designating a loop, processor 107
does not need to compute the branch target address and obtains
the address from branch target storage unit 114.

As a result, it is apparent that the apparatus of Second
Embodiment has reduced a branch hazard by one clock cycle in a
pipeline compared with the conventional apparatus, increasing
the speed of the loop processing.

<Third Embodiment>

The data processing apparatus of Third Embodi- 
ment of the present invention is explained. The appara- 
tus achieves high-speed loop processing by registering 
the first instruction of a loop and its address in an exclu- 
sive buffer and by using an address specified by a loop 
exclusive branching instruction. 

<Construction> 

The construction of the data processing apparatus 
of the present embodiment is the same as that of First 
Embodiment shown in Fig.4. 

<Operation> 

The operation of the present apparatus is the same as that of
First Embodiment shown in Figs.5 and 6 except that loop
exclusive branching instruction "lcc" generates an operation
different from that of First Embodiment when executed. This is
explained below.

After loop exclusive branching instruction "lcc" is
transferred to decode instruction buffer 109b (DR) and branch
executing unit 113 is activated, the activated branch executing
unit 113 transfers an address specified by instruction "lcc L",
namely the value of label "L" incremented by 4, to fetch
counter 108a (IC). Then, at the next clock cycle, branch
executing unit 113 transfers the address and instruction
"add a,b,c" registered in branch target storage unit 114 to
decode counter 109a (DC) and decode instruction buffer 109b
(DR) respectively.

Fig. 11 shows an operational flow in the pipeline at an
execution of loop exclusive branching instruction "lcc". Fig.
11 corresponds to Fig.9 in First Embodiment. "BR2" is used to
indicate a branch target instruction, namely "add a,b,c" in
this case; and "BR1" its address. Both values are registered in
branch target storage unit 114. Also, "«DR»" stands for a part
of the instruction stored in DR, and is used to indicate the
address specified by that part. This address is equal to the
value obtained by adding 4 to label "L", that is, the address
of the second instruction of the loop.

In stage IF at clock cycle 8, an instruction at the address
stored in IC is fetched first, then the address specified by
the part of the instruction stored in DR, namely the value of
label "L" incremented by 4, is transferred to IC. Then, in
stage DEC at clock cycle 9, the address stored in BR1 and the
instruction stored in BR2 are transferred from branch target
storage unit 114 to DC and DR. In the succeeding clock cycles,
the program operates the same as First Embodiment.
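
The difference from First Embodiment, namely that the fetch address comes from the branch instruction itself rather than from the storage unit, can be sketched as follows. This is an illustrative Python sketch: the function name `loop_branch_third_embodiment` and its parameters are assumptions.

```python
def loop_branch_third_embodiment(label_l, br1, br2, size=4):
    """Writes made when the second address is derived from label "L".

    label_l -- address specified by the loop exclusive branching instruction
    br1     -- first address registered in branch target storage unit 114
    br2     -- first instruction registered in branch target storage unit 114
    """
    cycle_8 = {"IC": label_l + size}     # stage IF: label value plus 4
    cycle_9 = {"DC": br1, "DR": br2}     # stage DEC: supplied from storage
    return cycle_8, cycle_9
```

When label "L" and BR1 hold the same address, the value written to IC is the same as in First Embodiment.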

As apparent from the above description, the present apparatus
generates a high-speed loop instruction when a loop is detected
during a compilation of a source program. The apparatus
registers the first instruction of the loop and its address in
branch target storage unit 114, which is an exclusive buffer in
processor 107, when the first repetition in the loop is
executed. After decoding a branch instruction designating a
loop, processor 107 does not need to compute a branch target
address or fetch a branch target instruction from a low-speed
external memory. Instead, it obtains the instruction and its
address from branch target storage unit 114 and also obtains
the address of the second instruction from a part of the loop
exclusive branching instruction.

As a result, it is apparent that the apparatus of Third
Embodiment has eliminated the branch hazard that occurs in the
conventional apparatus, increasing the speed of the loop
processing.

<Fourth Embodiment> 


The data processing apparatus of Fourth Embodiment of the
present invention is explained. The apparatus achieves
high-speed loop processing by registering the loop's first
instruction and its address and the address of the second
instruction in an exclusive buffer, branch target storage unit
114.

<Construction> 

The construction of the data processing apparatus of the
present embodiment is the same as that of First Embodiment
shown in Fig.4, except that branch target storage unit 114
stores the address of the second instruction as well as the
first instruction and its address.

<Operation>

The operation of the present apparatus is the same as that of
First Embodiment shown in Figs.5 and 6 except that branch
target registering instruction "set" and loop exclusive
branching instruction "lcc" generate operations different from
those of First Embodiment when executed. These points are
explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and branch
target registering unit 111 is activated, the unit 111, at the
next clock cycle, reads the address transferred to decode
counter 109a and the instruction transferred to decode
instruction buffer 109b, which is "add a,b,c", and registers
the address, the instruction, and the address incremented by 4
in branch target storage unit 114.

On the other hand, after loop exclusive branching instruction
"lcc" is transferred to decode instruction buffer 109b and
branch executing unit 113 is activated, the activated branch
executing unit 113 transfers the address of the second
instruction registered in branch target storage unit 114 to
fetch counter 108a. Then, branch executing unit 113 transfers
instruction "add a,b,c" and its address to decode instruction
buffer 109b and decode counter 109a respectively.

Fig. 12 shows an operational flow in the pipeline at an
execution of loop exclusive branching instruction "lcc". Fig.
12 corresponds to Fig.9 in First Embodiment. "BR3" is used to
indicate the address of the second instruction registered in
branch target storage unit 114.

In stage IF at clock cycle 8, an instruction stored in the
external memory at an address stored in IC is transferred to IR
first, then the address stored in BR3 is transferred to IC.
Then, in stage DEC at clock cycle 9, the instruction stored in
BR2 and the address stored in BR1 are transferred from branch
target storage unit 114 to DR and DC respectively. In the
succeeding clock cycles, the program operates the same as First
Embodiment.
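
Registration and branching in Fourth Embodiment can be sketched together, since the addition of 4 moves from branch time to registration time. This is a minimal Python sketch: the function names and the dictionary keyed by "BR1", "BR2", and "BR3" are assumptions for illustration.

```python
def register_fourth_embodiment(address, instruction, size=4):
    """Contents registered when the branch target registering unit runs."""
    return {
        "BR1": address,          # address of the first instruction
        "BR2": instruction,      # the first instruction itself
        "BR3": address + size,   # address of the second instruction
    }


def loop_branch_fourth_embodiment(storage):
    """Writes made when the loop exclusive branch is decoded."""
    cycle_8 = {"IC": storage["BR3"]}                        # no addition needed
    cycle_9 = {"DR": storage["BR2"], "DC": storage["BR1"]}
    return cycle_8, cycle_9
```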

As apparent from the above description, the present apparatus
generates a high-speed loop instruction when compiling a source
program, and registers the first address, the first
instruction, and the second address of a loop in branch target
storage unit 114 during the execution of the first repetition
in the loop. After decoding a branch instruction designating a
loop, processor 107 does not need to compute the branch target
address or the next address, and obtains the addresses from
branch target storage unit 114.

As a result, the apparatus of Fourth Embodiment has eliminated
the branch hazard that occurs in the conventional apparatus,
increasing the speed of the loop processing.

<Fifth Embodiment> 

The data processing apparatus of Fifth Embodiment of the
present invention is explained. The apparatus achieves
high-speed loop processing by registering the loop's first
instruction and the second address in branch target storage
unit 114 and by using a loop exclusive branching instruction
for specifying the first address.

<Construction>

The construction of the data processing apparatus of the
present embodiment is the same as that of First Embodiment
shown in Fig.4, except that branch target storage unit 114
stores the first instruction and the address of the second
instruction.

<Operation> 

The operation of the present apparatus is the same as that of
First Embodiment shown in Figs.5 and 6 except that branch
target registering instruction "set" and loop exclusive
branching instruction "lcc" generate operations different from
those of First Embodiment when executed. These points are
explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and branch
target registering unit 111 is activated, the unit 111, at the
next clock cycle, reads the address transferred to decode
counter 109a and the instruction transferred to decode
instruction buffer 109b, and registers the second address,
which is obtained by adding 4 to the transferred address, and
the instruction "add a,b,c" in branch target storage unit 114.

On the other hand, after loop exclusive branching instruction
"lcc" is transferred to decode instruction buffer 109b and
branch executing unit 113 is activated, the activated branch
executing unit 113 transfers the address of the second
instruction registered in branch target storage unit 114 to
fetch counter 108a. Then, at the next clock cycle, branch
executing unit 113 first transfers the address specified by
instruction "lcc", which is equal to the value of label "L",
from decode instruction buffer 109b to decode counter 109a,
then transfers instruction "add a,b,c" from unit 114 to decode
instruction buffer 109b.

Fig. 13 shows an operational flow in the pipeline at an
execution of loop exclusive branching instruction "lcc". Fig.
13 corresponds to Fig.9 in First Embodiment. "BR2" and "BR3"
are used to indicate the instruction, "add a,b,c", and the
address of the second instruction registered in branch target
storage unit 114 respectively.

In stage IF at clock cycle 8, an instruction stored in the
external memory at an address stored in IC is transferred to IR
first, then the address stored in BR3 is transferred to IC.
Then, in stage DEC at clock cycle 9, a part of the instruction
stored in DR is transferred to DC first, then the instruction
stored in BR2 is transferred from branch target storage unit
114 to DR. In the succeeding clock cycles, the program operates
the same as First Embodiment.
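
In Fifth Embodiment the first address is taken from the branch instruction while the other two values come from the storage unit. This can be sketched as follows, again as a hedged Python sketch with assumed function and parameter names.

```python
def loop_branch_fifth_embodiment(label_l, br2, br3):
    """Writes made when the first address is specified by label "L".

    label_l -- address specified by the loop exclusive branching instruction
    br2     -- first instruction registered in branch target storage unit 114
    br3     -- second address registered in branch target storage unit 114
    """
    cycle_8 = {"IC": br3}      # stage IF: second address from storage
    # Stage DEC: DC is written first from the part of the instruction in DR,
    # then DR is overwritten with the registered first instruction.
    cycle_9 = [("DC", label_l), ("DR", br2)]
    return cycle_8, cycle_9
```

The list in `cycle_9` preserves the order of the two transfers, which matters because DR is the source of the first transfer and the destination of the second.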

As apparent from the above description, the present apparatus
generates a high-speed loop instruction when compiling a source
program, and registers the first instruction and the second
address of a loop in branch target storage unit 114 during the
execution of the first repetition in the loop. After decoding a
branch instruction designating a loop, processor 107 does not
need to compute the branch target address or the next address,
nor fetch the branch target instruction from a low-speed
external memory, and obtains the addresses and instruction from
branch target storage unit 114 and loop exclusive branching
instruction "lcc L".

As a result, the apparatus of Fifth Embodiment has eliminated
the branch hazard that occurs in the conventional apparatus,
increasing the speed of the loop processing.

<Sixth Embodiment>

The data processing apparatus of Sixth Embodiment of the
present invention is explained. The apparatus achieves
high-speed loop processing by registering the loop's first
instruction and the second address in branch target storage
unit 114 and by computing the first address with a certain
expression.

<Construction>

The construction of the data processing apparatus of the
present embodiment is the same as that of First Embodiment
shown in Fig.4, except that branch target storage unit 114
stores the first instruction and the address of the second
instruction.

<Operation> 

The operation of the present apparatus is the same as that of
First Embodiment shown in Figs.5 and 6 except that branch
target registering instruction "set" and loop exclusive
branching instruction "lcc" generate operations different from
those of First Embodiment when executed. These points are
explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and branch
target registering unit 111 is activated, the unit 111, at the
next clock cycle, reads the address transferred to decode
counter 109a and the instruction transferred to decode
instruction buffer 109b, and registers the second address,
which is obtained by adding 4 to the transferred address, and
the instruction "add a,b,c" in branch target storage unit 114.

On the other hand, after loop exclusive branching instruction
"lcc" is transferred to decode instruction buffer 109b and
branch executing unit 113 is activated, the activated branch
executing unit 113 transfers the address of the second
instruction registered in branch target storage unit 114 to
fetch counter 108a. Then, at the next clock cycle, branch
executing unit 113 reads the address and instruction registered
in branch target storage unit 114, subtracts 4 from the
address, then transfers the resulting address and the
instruction "add a,b,c" to decode counter 109a and decode
instruction buffer 109b respectively.

Fig. 14 shows an operational flow in the pipeline at an
execution of loop exclusive branching instruction "lcc". Fig.
14 corresponds to Fig.9 in First Embodiment. "BR2" and "BR3"
are used to respectively indicate the instruction, "add a,b,c",
and the address of the second instruction registered in branch
target storage unit 114.

In stage IF at clock cycle 8, an instruction stored in the
external memory at an address stored in IC is transferred to IR
first, then the address stored in BR3 is transferred to IC.
Then, in stage DEC at clock cycle 9, the instruction stored in
BR2 is transferred from branch target storage unit 114 to DR,
and the address stored in BR3 is decremented by 4, then
transferred to DC. In the succeeding clock cycles, the program
operates the same as First Embodiment.
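
The "certain expression" used in Sixth Embodiment to recover the first address is a subtraction of the instruction size from the registered second address. A minimal Python sketch, with an assumed function name and parameters, is:

```python
def loop_branch_sixth_embodiment(br2, br3, size=4):
    """Writes made when only BR2 and BR3 are registered.

    br2 -- first instruction registered in branch target storage unit 114
    br3 -- second address registered in branch target storage unit 114
    """
    cycle_8 = {"IC": br3}                       # stage IF: second address
    cycle_9 = {"DR": br2, "DC": br3 - size}     # stage DEC: first address = BR3 - 4
    return cycle_8, cycle_9
```

Because BR3 was registered as the first address plus 4, subtracting 4 yields the first address again without it having to be stored or supplied by the branch instruction.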

As apparent from the above description, the present apparatus
generates a high-speed loop instruction when compiling a source
program, and registers the first instruction and the second
address of a loop in branch target storage unit 114 during the
execution of the first repetition in the loop. After decoding a
branch instruction designating a loop, processor 107 does not
need to compute the branch target address, nor fetch the branch
target instruction from a low-speed external memory, and
obtains the instruction and the address of the second
instruction from branch target storage unit 114.

As a result, the apparatus of Sixth Embodiment has eliminated
the branch hazard that occurs in the conventional apparatus,
increasing the speed of the loop processing.

<Seventh Embodiment>

The data processing apparatus of Seventh Embodiment achieves
high-speed loop processing by registering the decoded first
address and the decoded first instruction in branch target
storage unit 114.



<Construction>

The construction of the data processing apparatus of the
present embodiment is the same as that of First Embodiment
shown in Fig.4, except that branch target storage unit 114
stores the decoded first address and the decoded first
instruction.

<Operation>

The operation of the present apparatus is the same as that of
First Embodiment shown in Figs.5 and 6 except that branch
target registering instruction "set" and loop exclusive
branching instruction "lcc" generate operations different from
those of First Embodiment when executed. These points are
explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and branch
target registering unit 111 is activated, the unit 111, at the
next clock cycle, reads an address and the decoded instruction
"add a,b,c" from decode counter 109a and decode instruction
buffer 109b respectively, and registers the read information in
branch target storage unit 114.

On the other hand, after loop exclusive branching instruction
"lcc" is transferred to decode instruction buffer 109b and
branch executing unit 113 is activated, the activated branch
executing unit 113 transfers the address of the second
instruction specified by instruction "lcc L", which is equal to
label "L" incremented by 4, to fetch counter 108a. Then, at the
next clock cycle, branch executing unit 113 transfers "0" to
decode counter 109a (DC) and decode instruction buffer 109b
(DR) to invalidate them, then at the next clock cycle,
transfers the decoded address and decoded instruction
"add a,b,c" from branch target storage unit 114 to execute
counter 110a and execution controlling unit 110b respectively.

Fig. 15 shows an operational flow in the pipeline at
an execution of loop exclusive branching instruction
"lcc". Fig. 15 corresponds to Fig.9 in First Embodiment.
"BR1" and "BR4" are used to respectively indicate the
decoded first address and the decoded first instruction
registered in branch target storage unit 114.
In stage IF at clock cycle 8, an instruction stored in
the external memory at an address stored in IC is transferred
to IR first, then, a part of the instruction stored in
DR is transferred to IC. Then, in stage DEC at clock cycle 9,
"0" is transferred to DC and DR. In stage EX at
clock cycle 10, the decoded instruction stored in BR4
and the address stored in BR1 are transferred from
branch target storage unit 114 to ER and EC respectively.
In the succeeding clock cycles, the program operates
the same as in First Embodiment.
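
The register transfers described for Fig. 15 can be traced with a small model. The following Python sketch is illustrative only: the dictionary-based register file, the function names `execute_set` and `execute_lcc`, and the example addresses are assumptions introduced here, not part of the disclosed hardware. The 4-byte instruction length follows the "label L plus 4" description above, and the transfer of the fall-through instruction into IR is omitted for brevity.

```python
INSTRUCTION_SIZE = 4  # bytes; taken from "label L plus 4" in the text


def execute_set(regs, branch_store):
    """Model of 'set' (hypothetical helper): one cycle after 'set' reaches DR,
    DC and DR hold the address and decoded form of the loop's first
    instruction, which are copied into the storage unit as BR1 and BR4."""
    branch_store["BR1"] = regs["DC"]
    branch_store["BR4"] = regs["DR"]


def execute_lcc(regs, branch_store, label_address):
    """Model of a taken 'lcc L' (hypothetical helper); returns one register
    snapshot per clock cycle for stages IF, DEC, and EX."""
    trace = []
    # Stage IF: redirect fetching to the second instruction of the loop.
    regs["IC"] = label_address + INSTRUCTION_SIZE
    trace.append(dict(regs))
    # Stage DEC: write 0 to DC and DR to invalidate the fall-through instruction.
    regs["DC"], regs["DR"] = 0, 0
    trace.append(dict(regs))
    # Stage EX: hand the stored address and decoded instruction to EC and ER.
    regs["EC"], regs["ER"] = branch_store["BR1"], branch_store["BR4"]
    trace.append(dict(regs))
    return trace


# Example run with made-up addresses: the loop entry "L" is at 0x100.
regs = {"IC": 0x104, "DC": 0x100, "DR": "decoded(add a,b,c)", "EC": 0, "ER": None}
store = {}
execute_set(regs, store)
regs.update(IC=0x124, DC=0x120, DR="decoded(lcc L)")
trace = execute_lcc(regs, store, label_address=0x100)
```

In this model the first instruction of the loop reaches the execution stage without being fetched or decoded again, while fetching resumes at the second instruction.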

As apparent from the above description, the present 
apparatus generates a high-speed loop instruction 
when compiling a source program, and registers the de- 
coded first address and the decoded first instruction of 
a loop in branch target storage unit 114 during the exe- 
cution of the first repetition in the loop. After decoding a 
branch instruction designating a loop, processor 107 
does not need to compute branch target address and 
the second address, nor fetch the branch target instruc- 
tion from a low-speed external memory, and obtains the 
decoded branch target instruction, the decoded branch 
target address, and the address of the second instruction
from branch target storage unit 114 and loop exclusive
branching instruction "lcc L".

As a result, the apparatus of Seventh Embodiment 
has deleted the branch hazard that occurs in the con- 
ventional apparatus, increasing the speed of the loop 
processing. 

<Eighth Embodiment>

The data processing apparatus of Eighth Embodi- 
ment achieves high-speed loop processing by register- 
ing the first address and the decoded first instruction in 
branch target storage unit 114. 

<Construction> 

The construction of the data processing apparatus 
of the present embodiment is the same as that of First 
Embodiment shown in Fig.4, except that branch target
storage unit 114 stores the first address and the decoded
first instruction.

<Operation> 

The operation of the present apparatus is the same
as that of First Embodiment shown in Figs.5 and 6, except
that branch target registering instruction "set" and
loop exclusive branching instruction "lcc" operate
differently from those of First Embodiment when
executed. These differences are explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and
branch target registering unit 111 is activated, the unit
111, at the next clock cycle, reads the first address and
the decoded first instruction from decode counter 109a
and decode instruction buffer 109b, and registers the
read information in branch target storage unit 114.

On the other hand, after loop exclusive branching
instruction "lcc" is transferred to decode instruction
buffer 109b and branch executing unit 113 is activated, the
activated branch executing unit 113 adds 4 to the address
registered in unit 114 and transfers the result to
fetch counter 108a. Then, at the next clock cycle, branch
executing unit 113 transfers "0" to decode counter 109a
(DC) and decode instruction buffer 109b (DR) to invalidate
them, then at the next clock cycle, transfers the first
address and the decoded first instruction from branch
target storage unit 114 to execution instruction counter
110a and execution controlling unit 110b respectively.
Fig. 16 shows an operational flow in the pipeline at
an execution of loop exclusive branching instruction
"lcc". Fig. 16 corresponds to Fig.9 in First Embodiment.
"BR1" and "BR4" are used to respectively indicate the
first address and the decoded first instruction registered
in branch target storage unit 114.
In stage IF at clock cycle 8, an instruction stored in
the external memory at an address stored in IC is transferred
to IR first, then, 4 is added to the address read
from branch target storage unit 114 and the result is
transferred to IC. Then, in stage DEC at clock cycle 9,
"0" is transferred to DC and DR. In stage EX at clock
cycle 10, the decoded instruction stored in BR4 and the
address stored in BR1 are transferred from
branch target storage unit 114 to ER and EC respectively.
In the succeeding clock cycles, the program operates
the same as in First Embodiment.

As apparent from the above description, the present
apparatus generates a high-speed loop instruction
when compiling a source program, and registers the
decoded first instruction and the second address of a loop
in branch target storage unit 114 during the execution
of the first repetition in the loop. After decoding a branch
instruction designating a loop, processor 107 does not
need to compute branch target address and the second
address, nor fetch the branch target instruction from a
low-speed external memory, and obtains the decoded
first instruction and the second address from branch
target storage unit 114.

As a result, the apparatus of Eighth Embodiment
has deleted the branch hazard that occurs in the
conventional apparatus, increasing the speed of the loop
processing.

<Ninth Embodiment> 

The data processing apparatus of Ninth Embodiment
achieves high-speed loop processing by registering
the decoded first address, the decoded first instruction,
and the address of the second instruction in branch
target storage unit 114.

<Construction> 

The construction of the data processing apparatus
of the present embodiment is the same as that of First
Embodiment shown in Fig.4, except that branch target
storage unit 114 stores the first address, the decoded
first instruction, and the address of the second instruction
of a loop.

<Operation> 

The operation of the present apparatus is the same
as that of First Embodiment shown in Figs.5 and 6, except
that branch target registering instruction "set" and
loop exclusive branching instruction "lcc" operate
differently from those of First Embodiment when
executed. These differences are explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and
branch target registering unit 111 is activated, the unit
111, at the next clock cycle, reads the decoded first address
and the decoded first instruction, and registers this
information, together with the value of the address plus 4, in
branch target storage unit 114.

On the other hand, after loop exclusive branching
instruction "lcc" is transferred to decode instruction
buffer 109b and branch executing unit 113 is activated, the
activated branch executing unit 113 transfers the address
of the second instruction registered in unit 114 to fetch
counter 108a. Then, at the next clock cycle, branch
executing unit 113 transfers "0"
to decode counter 109a (DC) and decode instruction
buffer 109b (DR) to invalidate them, then at the next
clock cycle, transfers the first address and the decoded
first instruction "add a,b,c" from branch target storage
unit 114 to execution instruction counter 110a and
execution controlling unit 110b respectively.

Fig. 17 shows an operational flow in the pipeline at
an execution of loop exclusive branching instruction
"lcc". Fig. 17 corresponds to Fig.9 in First Embodiment.
"BR1", "BR3", and "BR4" are used to respectively indicate
the first address, the address of the second instruction,
and the decoded first instruction registered in
branch target storage unit 114.

In stage IF at clock cycle 8, an instruction stored in
the external memory at an address stored in IC is transferred
to IR first, then, the address stored in BR3 is transferred
to IC. Then, in stage DEC at clock cycle 9, "0" is
transferred to DC and DR. In stage EX at clock cycle 10,
the decoded first instruction stored in BR4 and the first
address stored in BR1 are transferred from branch target
storage unit 114 to ER and EC respectively. In the
succeeding clock cycles, the program operates the
same as in First Embodiment.

As apparent from the above description, the present
apparatus generates a high-speed loop instruction
when compiling a source program, and registers the first
micro instruction, the first address, and the second address
of a loop in branch target storage unit 114 during
the execution of the first repetition in the loop. After
decoding a branch instruction designating a loop, processor
107 does not need to compute branch target address
and the second address, nor fetch the branch target
instruction from a low-speed external memory, and
obtains the decoded branch target instruction, the
decoded branch target address, and the address of the
second instruction from branch target storage unit 114.

As a result, the apparatus of Ninth Embodiment has 
deleted the branch hazard that occurs in the conven- 
tional apparatus, increasing the speed of the loop 
processing. 

<Tenth Embodiment>

The data processing apparatus of Tenth Embodiment
achieves high-speed loop processing by registering
the decoded first instruction and the address of the
second instruction in branch target storage unit 114 and
by using a loop exclusive branching instruction for
specifying the first address.

<Construction> 

The construction of the data processing apparatus 
of the present embodiment is the same as that of First 
Embodiment shown in Fig.4, except that branch target
storage unit 114 stores the decoded first instruction and
the address of the second instruction of a loop. 

<Operation> 

The operation of the present apparatus is the same
as that of First Embodiment shown in Figs.5 and 6, except
that branch target registering instruction "set" and
loop exclusive branching instruction "lcc" operate
differently from those of First Embodiment when
executed. These differences are explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and
branch target registering unit 111 is activated, the unit
111, at the next clock cycle, reads the first address and
the decoded first instruction, and registers the value of the
first address plus 4 and the decoded instruction
"add a,b,c" in branch target storage unit 114.

On the other hand, after loop exclusive branching
instruction "lcc" is transferred to decode instruction
buffer 109b and branch executing unit 113 is activated, the
activated branch executing unit 113 transfers the address
of the second instruction registered in unit 114 to
fetch counter 108a. Then, at the next clock cycle, branch
executing unit 113 transfers "0" to decode counter 109a
(DC) and decode instruction buffer 109b (DR) to invalidate
them, then at the next clock cycle, transfers an address
specified by instruction "lcc", namely the value for
"L", from execution controlling unit 110b to execution
instruction counter 110a, then transfers the decoded
instruction "add a,b,c" registered in unit 114 to execution
controlling unit 110b.

Fig. 18 shows an operational flow in the pipeline at
an execution of loop exclusive branching instruction
"lcc". Fig. 18 corresponds to Fig.9 in First Embodiment.
"BR3" and "BR4" are used to respectively indicate the
address of the second instruction and the decoded first
instruction registered in branch target storage unit 114.

In stage IF at clock cycle 8, an instruction stored in
the external memory at an address stored in IC is transferred
to IR first, then, the address stored in BR3 is transferred
to IC. Then, in stage DEC at clock cycle 9, "0" is
transferred to DC and DR. In stage EX at clock cycle 10,
a part of an instruction stored in ER is transferred to EC
first, then, the decoded first instruction is transferred
from branch target storage unit 114 to ER. In the
succeeding clock cycles, the program operates the same
as in First Embodiment.

As apparent from the above description, the present 
apparatus generates a high-speed loop instruction 
when compiling a source program, and registers the de- 
coded first instruction and the second address of a loop 
in branch target storage unit 114 during the execution 
of the first repetition in the loop. After decoding a branch 
instruction designating a loop, processor 107 does not 
need to compute branch target address and the second 
address, nor fetch the branch target instruction from a 
low-speed external memory, and obtains the decoded 
branch target instruction, the address of the second
instruction, and branch target address from branch target
storage unit 114 and loop exclusive instruction "lcc L".

As a result, the apparatus of Tenth Embodiment has 
deleted the branch hazard that occurs in the conven- 
tional apparatus, increasing the speed of the loop 
processing. 

<Eleventh Embodiment>

The data processing apparatus of Eleventh Embodiment
achieves high-speed loop processing by registering
the decoded first instruction and the address of the
second instruction in branch target storage unit 114 and
by computing the branch target address.

<Construction> 

The construction of the data processing apparatus 
of the present embodiment is the same as that of First 
Embodiment shown in Fig.4, except that branch target 
storage unit 114 stores the decoded first instruction and 
the address of the second instruction of a loop.

<Operation> 

The operation of the present apparatus is the same
as that of First Embodiment shown in Figs.5 and 6, except
that branch target registering instruction "set" and
loop exclusive branching instruction "lcc" operate
differently from those of First Embodiment when
executed. These differences are explained below.

After branch target registering instruction "set" is
transferred to decode instruction buffer 109b (DR) and
branch target registering unit 111 is activated, the unit
111, at the next clock cycle, reads the first address and
the decoded first instruction, adds 4 to the address,
and registers the result value and the decoded instruction
in branch target storage unit 114.

On the other hand, after loop exclusive branching
instruction "lcc" is transferred to decode instruction
buffer 109b and branch executing unit 113 is activated, the
activated branch executing unit 113 transfers the result
value registered in unit 114 to fetch counter 108a. Then,
at the next clock cycle, branch executing unit 113 transfers
"0" to decode instruction counter 109a (DC) and decode
instruction buffer 109b (DR) to invalidate them,
then at the next clock cycle, reads the address and the
decoded instruction "add a,b,c" from branch target storage
unit 114, and transfers the value of the address minus
4 and the decoded instruction to execution
instruction counter 110a and execution controlling unit
110b respectively.

Fig. 19 shows an operational flow in the pipeline at
an execution of loop exclusive branching instruction
"lcc". Fig. 19 corresponds to Fig.9 in First Embodiment.
"BR3" and "BR4" are used to respectively indicate the
address of the second instruction and the decoded first
instruction registered in branch target storage unit 114.

In stage IF at clock cycle 8, an instruction stored in
the external memory at an address stored in IC is transferred
to IR first. Then, in stage DEC at clock cycle 9,
"0" is transferred to DC and DR. In stage EX at clock
cycle 10, the decoded instruction stored in BR4 is transferred
to ER, and 4 is subtracted from the address stored in BR3
and the result is transferred to EC. In the succeeding clock
cycles, the program operates the same as in First
Embodiment.

As apparent from the above description, the present
apparatus generates a high-speed loop instruction
when compiling a source program, and registers the
decoded first instruction and the second address of a loop
in branch target storage unit 114 during the execution
of the first repetition in the loop. After decoding a branch
instruction designating a loop, processor 107 does not 
need to compute the second address, nor fetch the 
branch target instruction from a low-speed external 
memory, and obtains the decoded branch target instruc- 
tion and the address of the second instruction from 
branch target storage unit 114. 

As a result, the apparatus of Eleventh Embodiment 
has deleted the branch hazard that occurs in the con- 
ventional apparatus, increasing the speed of the loop 
processing. 
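
The Seventh to Eleventh Embodiments differ only in which addresses are registered and how the fetch and execution addresses are derived when "lcc" is taken. The following Python sketch summarizes those differences as described above; the names `StoredTarget`, `register_target`, and `resolve_addresses` are hypothetical, and the model ignores timing entirely.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

INSTRUCTION_SIZE = 4  # bytes, per the "plus 4" / "minus 4" steps in the text


@dataclass(frozen=True)
class StoredTarget:
    first_address: Optional[int]   # BR1, when the embodiment registers it
    second_address: Optional[int]  # BR3, when the embodiment registers it
    decoded_first: str             # BR4, registered in all five embodiments


def register_target(embodiment: int, first_address: int, decoded_first: str) -> StoredTarget:
    """What 'set' registers in embodiments 7 to 11 (illustrative model)."""
    second = first_address + INSTRUCTION_SIZE
    if embodiment in (7, 8):
        return StoredTarget(first_address, None, decoded_first)
    if embodiment == 9:
        return StoredTarget(first_address, second, decoded_first)
    if embodiment in (10, 11):
        return StoredTarget(None, second, decoded_first)
    raise ValueError("only embodiments 7 to 11 are modeled")


def resolve_addresses(embodiment: int, stored: StoredTarget, label_address: int) -> Tuple[int, int]:
    """Return (fetch_address, execute_address) used when 'lcc L' is taken."""
    if embodiment == 7:   # fetch address derived from the operand of 'lcc L'
        return label_address + INSTRUCTION_SIZE, stored.first_address
    if embodiment == 8:   # fetch address is the stored first address plus 4
        return stored.first_address + INSTRUCTION_SIZE, stored.first_address
    if embodiment == 9:   # both addresses are read from the storage unit
        return stored.second_address, stored.first_address
    if embodiment == 10:  # execution address taken from the operand of 'lcc L'
        return stored.second_address, label_address
    if embodiment == 11:  # execution address is the stored second address minus 4
        return stored.second_address, stored.second_address - INSTRUCTION_SIZE
    raise ValueError("only embodiments 7 to 11 are modeled")
```

Under this model every embodiment yields the same pair of addresses; the embodiments trade storage capacity against the arithmetic performed when the branch is taken.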

The present invention has more variations other
than these embodiments. For example:

(1) Compiler 102 in the above embodiments is described
as outputting the high-speed loop instructions
"set", "clr", and "lcc". However, the compiler may
output two kinds of each high-speed loop instruction, such as
"set1" and "set2", "clr1" and "clr2", and "lcc1" and "lcc2",
instead of each of the above instructions. These instructions
may be related to locations of branch target information
stored in branch target storage unit 114. This
reduces the time for accessing branch target storage unit 114.
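
One way to read variation (1) is that the instruction name itself selects a fixed location in the storage unit, so no search or shifting is needed. The Python sketch below is a hedged illustration of that reading; the class `SlottedBranchTargetStore`, the table `SLOT_BY_OPCODE`, and the slot assignment are assumptions made for this example.

```python
# Hypothetical mapping from instruction name to a fixed storage location.
SLOT_BY_OPCODE = {
    "set1": 0, "clr1": 0, "lcc1": 0,
    "set2": 1, "clr2": 1, "lcc2": 1,
}


class SlottedBranchTargetStore:
    """Two-location store in which the opcode suffix picks the location."""

    def __init__(self):
        self.slots = [None, None]

    def execute(self, opcode, info=None):
        slot = SLOT_BY_OPCODE[opcode]
        kind = opcode[:3]
        if kind == "set":      # register branch target information
            self.slots[slot] = info
            return None
        if kind == "clr":      # clear the registered information
            self.slots[slot] = None
            return None
        return self.slots[slot]  # "lcc": read the registered information


# Example: an outer loop uses the "...1" instructions, an inner loop "...2".
unit = SlottedBranchTargetStore()
unit.execute("set1", "outer loop target")
unit.execute("set2", "inner loop target")
inner = unit.execute("lcc2")
```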

(2) Branch target storage unit 114 in the above
embodiments comprises LIFO latches. However, the unit
may comprise another type of latches, such as "ring" latches.
The ring latches enable the unit to input or output branch
target information with flexibility and at high speed.

Also, branch target storage unit 114, having a capacity
of up to two pairs of pieces of branch target information
in the above embodiments, may have a capacity
of a single pair of pieces of information. In such a case, branch
target registering unit 111 keeps overwriting information
in branch target storage unit 114, and branch target
clearing unit 112 is no longer required.
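
The two storage arrangements described above can be contrasted in a short Python sketch. It is a simplified model under stated assumptions: the class and method names are hypothetical, "clr" is assumed to discard the most recently registered entry, and raising an error on overflow is a choice made for this example rather than behavior stated in the text.

```python
class LifoBranchTargetStore:
    """Last-in first-out store with room for two entries (one per nested loop)."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.entries = []

    def set(self, info):
        if len(self.entries) == self.capacity:
            # Assumption: the compiler never nests more loops than the capacity.
            raise OverflowError("branch target storage is full")
        self.entries.append(info)

    def lcc(self):
        return self.entries[-1]  # the innermost active loop is on top

    def clr(self):
        self.entries.pop()       # leaving a loop exposes the enclosing one


class SingleEntryStore:
    """Single-entry variant: 'set' overwrites, so no clearing is needed."""

    def __init__(self):
        self.entry = None

    def set(self, info):
        self.entry = info

    def lcc(self):
        return self.entry


# Example: an inner loop nested in an outer loop.
lifo = LifoBranchTargetStore()
lifo.set("outer")
lifo.set("inner")
top_inside_inner = lifo.lcc()
lifo.clr()
top_after_inner = lifo.lcc()
```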

Also, when branch target storage unit 114 stores an 
instruction sequence, the total size of the instructions to 
be stored, not the total number of instructions to be 
stored, may be fixed. With such arrangement, the ca- 
pacity of the unit will be used effectively even for varia- 
ble-length instructions. 
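
The effect of fixing the total size rather than the instruction count can be shown with a small Python sketch. The function name `capture_instructions`, the mnemonics, and the byte sizes are made up for illustration.

```python
def capture_instructions(instructions, byte_budget):
    """Keep the leading instructions of a loop that fit in a fixed byte budget.

    `instructions` is a list of (mnemonic, size_in_bytes) pairs in program
    order; capturing stops at the first instruction that would not fit.
    """
    captured = []
    used = 0
    for mnemonic, size in instructions:
        if used + size > byte_budget:
            break
        captured.append(mnemonic)
        used += size
    return captured, used


# Hypothetical variable-length loop body: sizes of 2 or 4 bytes.
loop_body = [("add", 2), ("ld", 4), ("mul", 2), ("st", 4)]
captured, used = capture_instructions(loop_body, byte_budget=8)
```

With a fixed count of two instructions the same 8-byte store would hold only "add" and "ld" (6 bytes), whereas the byte budget also admits "mul".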

(3) Loop detecting unit 103 in the above embodi- 
ments detects up to two nested loops from innermost 
when one or more loops are nested in a loop. However, 
in such case, the unit may detect loops that are executed 
most frequently. 
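
The two selection policies in variation (3) can be sketched as follows. This Python example is only an illustration: the `LoopInfo` record, the availability of an estimated execution count (for instance from profiling), and the sample numbers are assumptions, not details given in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoopInfo:
    label: str
    depth: int                 # nesting depth, 1 = outermost
    estimated_executions: int  # assumed to come from profiling or estimation


def select_innermost(loops: List[LoopInfo], limit: int = 2) -> List[LoopInfo]:
    """Policy of the embodiments: up to `limit` loops, counted from the innermost."""
    return sorted(loops, key=lambda loop: loop.depth, reverse=True)[:limit]


def select_most_frequent(loops: List[LoopInfo], limit: int = 2) -> List[LoopInfo]:
    """Policy of variation (3): the loops expected to execute most often."""
    return sorted(loops, key=lambda loop: loop.estimated_executions, reverse=True)[:limit]


# A triple nest in which the innermost loop is rarely entered.
nest = [
    LoopInfo("L1", depth=1, estimated_executions=1000),
    LoopInfo("L2", depth=2, estimated_executions=3000),
    LoopInfo("L3", depth=3, estimated_executions=200),
]
by_depth = [loop.label for loop in select_innermost(nest)]
by_frequency = [loop.label for loop in select_most_frequent(nest)]
```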

(4) Instruction fetching unit 108, which stores one
instruction in the above embodiments, may store more
instructions in First In First Out (FIFO) format.

(5) Processor 107 in the above embodiments executes
high-speed loop instructions "set" and "clr" one
by one. However, the processor may execute
these instructions in parallel with other nearby instructions.
These instructions can be executed independently
because they only access branch target storage unit
114 and do not depend on other instructions.

In the above embodiments, branch target registering
unit 111, branch target clearing unit 112, and branch
executing unit 113 are respectively activated by instructions
"set", "clr", and "lcc". However, other instructions
may include such functions so that high-speed loop
instructions are executed in parallel with other instructions.

(6) In First and Eighth Embodiments, branch executing
unit 113 may use an exclusive incrementing device
in the unit, not shown in Fig.4, when adding 4 to the
address stored in branch target storage unit 114.

In Third, Fifth, Seventh, and Tenth Embodiments,
branch executing unit 113 may use an exclusive arith- 
metic unit in the unit not shown in Fig.4 when computing 
an address that is equal to label "L". 

In Sixth and Eleventh Embodiments, branch executing
unit 113 may use an exclusive decrementing device
in the unit, not shown in Fig.4, when subtracting 4
from the address stored in branch target storage unit
114.

By allowing branch executing unit 113 to use these
exclusive units, the processor will achieve high-speed
performance.



Although the present invention has been fully de- 
scribed by way of examples with reference to the ac- 
companying drawings, it is to be noted that various 
changes and modifications will be apparent to those 
skilled in the art. Therefore, unless such changes and
modifications depart from the scope of the present in- 
vention, they should be construed as being included 
therein. 



Claims 

1. A compiler for compiling a source program and gen- 
erating a program containing a machine-language 
instruction sequence, comprising: 

A loop detecting means for detecting certain
loops from the source program and extracting
information of the detected loops from the
source program, the extracted information being
used to specify the certain loops; and
A high-speed loop applying means, comprising:

A first loop exclusive instruction generating
unit for generating a first loop exclusive
instruction which indicates a succeeding
instruction is an entry of a loop and placing
the first loop exclusive instruction immediately
before the entry of the loop in the
machine-language instruction sequence; and
A second loop exclusive instruction gener- 
ating unit for generating second loop exclu- 
sive instructions which direct the program 
to branch to the entry of the loop and plac- 
ing the second loop exclusive instructions 
at places from where the program branch- 
es to the entry of the loop, the first loop ex- 
clusive instruction generating unit and the 
second loop exclusive instruction generat- 
ing unit operating based on the information 
extracted by the loop detecting means. 

2. The compiler as defined in Claim 1 wherein the 
high-speed loop applying means further comprises 

a third loop exclusive instruction generating
unit for generating a third loop exclusive instruction
which indicates that the loop has ended, and placing
the third loop exclusive instruction immediately
after an exit of the loop in the machine-language
instruction sequence, based on the information
extracted by the loop detecting means.
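
The placement rules of Claims 1 and 2 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the `Ins` record, the branch mnemonics in `BRANCH_OPS`, the use of "set", "lcc", and "clr" as the first, second, and third loop exclusive instructions, and the representation of loop information as an entry label plus the index of the loop's last instruction.

```python
from typing import List, NamedTuple, Optional


class Ins(NamedTuple):
    op: str
    arg: str = ""
    label: Optional[str] = None


BRANCH_OPS = {"bcc", "bra"}  # hypothetical branch mnemonics


def apply_high_speed_loop(code: List[Ins], entry_label: str, exit_index: int) -> List[Ins]:
    """Rewrite one detected loop.

    entry_label: label of the loop entry (from the loop detecting step).
    exit_index:  index in `code` of the last instruction of the loop.
    """
    out = []
    for index, ins in enumerate(code):
        if ins.label == entry_label:
            # First loop exclusive instruction: immediately before the entry.
            out.append(Ins("set"))
        if ins.op in BRANCH_OPS and ins.arg == entry_label:
            # Second loop exclusive instruction: replaces each branch to the entry.
            ins = Ins("lcc", entry_label, ins.label)
        out.append(ins)
        if index == exit_index:
            # Third loop exclusive instruction: immediately after the exit.
            out.append(Ins("clr"))
    return out


program = [
    Ins("mov", "i,0"),
    Ins("add", "a,b,c", label="L"),
    Ins("cmp", "i,n"),
    Ins("bcc", "L"),
    Ins("ret"),
]
rewritten = apply_high_speed_loop(program, entry_label="L", exit_index=3)
```

Because "set" is placed before the labelled entry rather than on it, a branch back to "L" does not execute "set" again.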

3. A processor for executing a program containing a 
machine-language instruction sequence which in- 
cludes certain instructions, namely a first loop ex- 
clusive instruction and a second loop exclusive in- 
struction, comprising: 
a pipeline, comprising:

a fetching unit for fetching instructions one
by one from the machine-language instruction
sequence;
a decoding unit for decoding the instructions
fetched by the fetching unit; and
an executing unit for executing the instructions
decoded by the decoding unit;

a branch target storage means;
a registering means for, after the decoding unit
has decoded a first loop exclusive instruction,
registering branch target information in the
branch target storage means; and
a branch executing means for, after the decoding
unit has decoded a second loop exclusive
instruction, judging whether to execute a loop,
if judges to execute, reading the branch target
information registered in the branch target storage
means, and controlling the pipeline so that
the program executes the loop using the read
branch target information.

4. A processor for executing a program containing a
machine-language instruction sequence which includes
certain instructions, namely a first loop exclusive
instruction and a second loop exclusive instruction,
comprising:

a pipeline, comprising:

a fetching unit for fetching instructions one
by one from the machine-language instruction
sequence;
a decoding unit for decoding the instructions
fetched by the fetching unit; and
an executing unit for executing the instructions
decoded by the decoding unit;

a branch target storage means;
a registering means for, after the decoding unit
has decoded a first loop exclusive instruction,
registering branch target information in the
branch target storage means;
a branch executing means for, after the decoding
unit has decoded a second loop exclusive
instruction, judging whether to execute a loop,
if judges to execute, reading the branch target
information registered in the branch target storage
means, and controlling the pipeline so that
the program executes the loop using the read
branch target information; and
a clearing means for, after the decoding unit
has decoded a third loop exclusive instruction,
clearing the branch target information registered
in the branch target storage means.



5. The processor as defined in Claim 4, wherein 

the registering means, after the decoding unit 
has decoded a first loop exclusive instruction, 
registers an address of an instruction succeed- 
ing to the first loop exclusive instruction in the 
branch target storage means, and wherein 
the branch executing means, if having judged 
to execute a loop, reads the address registered 
in the branch target storage means, and con- 
trols the pipeline so that the fetching unit fetch- 
es instructions starting from the instruction at 
the address. 

6. The processor as defined in Claim 4, wherein 

the registering means, after the decoding unit 
has decoded a first loop exclusive instruction, 
registers an address of an instruction succeed- 
ing to the decoded first loop exclusive instruc- 
tion and a certain number of instructions suc- 
ceeding to the first loop exclusive instruction in 
the branch target storage means, and wherein 
the branch executing means, if having judged 
to execute a loop, reads the address and the 
certain number of instructions registered in the 
branch target storage means, and controls the 
pipeline so that the decoding unit decodes the 
certain number of instructions starting from the 
instruction at the read address and the fetching
unit fetches instructions starting from an in- 
struction at an address which is obtained by 
performing a certain computation on an ad- 
dress specified by the second loop exclusive in- 
struction. 

7. The processor as defined in Claim 4, wherein 

the registering means, after the decoding unit 
has decoded a first loop exclusive instruction, 
registers an address of an instruction succeed- 
ing to the first loop exclusive instruction and a 
certain number of instructions succeeding to 
the first loop exclusive instruction in the branch 
target storage means, and wherein 
the branch executing means, if having judged 
to execute a loop, reads the address and the 
certain number of instructions registered in the
branch target storage means, and controls the 
pipeline so that the decoding unit decodes the 
certain number of instructions starting from the 
instruction at the read address and the fetching 
unit fetches instructions starting from an in- 
struction at an address which is obtained by 
performing a certain computation on the read 
address. 

8. The processor as defined in Claim 4, wherein 
the registering means, after the decoding unit
has decoded a first loop exclusive instruction,
registers a first address of an instruction succeeding
to the first loop exclusive instruction, a
certain number of instructions succeeding to
the first loop exclusive instruction, and a second
address of an instruction to be executed
immediately after the certain number of instructions
in the branch target storage means, and
wherein
the branch executing means, if having judged
to execute a loop, reads the first address, the
certain number of instructions, and the second
address registered in the branch target storage
means, and controls the pipeline so that the decoding
unit decodes the certain number of instructions
starting from the instruction at the
first address and the fetching unit fetches instructions
starting from the instruction at the
second address.

9. The processor as defined in Claim 4, wherein 

the registering means, after the decoding unit
has decoded a first loop exclusive instruction,
registers a certain number of instructions succeeding
to the first loop exclusive instruction
and an address of an instruction to be executed
immediately after the certain number of instructions
in the branch target storage means, and
wherein
the branch executing means, if having judged
to execute a loop, reads the certain number of
instructions and the address registered in the
branch target storage means, and controls the
pipeline so that the decoding unit decodes the
certain number of instructions starting from an
instruction at an address specified by the second
loop exclusive instruction and the fetching
unit fetches instructions starting from an
instruction at the address registered in the branch
target storage means.

10. The processor as defined in Claim 4, wherein

the registering means, after the decoding unit
has decoded a first loop exclusive instruction,
registers a certain number of instructions succeeding
to the first loop exclusive instruction
and an address of an instruction to be executed
immediately after the certain number of instructions
in the branch target storage means, and
wherein
the branch executing means, if having judged
to execute a loop, reads the certain number of
instructions and the address registered in the
branch target storage means, and controls the
pipeline so that the decoding unit decodes the
certain number of instructions starting from an
instruction at an address which is obtained by
performing a certain computation on the read
address and the fetching unit fetches instructions
starting from an instruction at the read
address.

11. The processor as defined in Claim 4, wherein

the registering means, after the decoding unit
has decoded a first loop exclusive instruction,
registers an address of an instruction succeeding
to the first loop exclusive instruction and a
decoded certain number of instructions succeeding
to the first loop exclusive instruction in
the branch target storage means, and wherein
the branch executing means, if having judged
to execute a loop, reads the address and the
decoded certain number of instructions registered
in the branch target storage means, and
controls the pipeline so that the executing unit
executes the decoded certain number of instructions
starting from an instruction at the
read address and the fetching unit fetches instructions
starting from an instruction at an address
obtained by performing a certain computation
on an address specified by the second
loop exclusive instruction.

12. The processor as defined in Claim 4, wherein

the registering means, after the decoding unit has decoded a first loop exclusive instruction, registers an address of an instruction succeeding to the first loop exclusive instruction and a decoded certain number of instructions succeeding to the first loop exclusive instruction in the branch target storage means, and wherein

the branch executing means, if having judged to execute a loop, reads the address and the decoded certain number of instructions registered in the branch target storage means, and controls the pipeline so that the executing unit executes the decoded certain number of instructions starting from an instruction at the read address and the fetching unit fetches instructions starting from an instruction at an address obtained by performing a certain computation on the read address.

13. The processor as defined in Claim 4, wherein

the registering means, after the decoding unit has decoded a first loop exclusive instruction, registers a first address of an instruction succeeding to the first loop exclusive instruction, a decoded certain number of instructions succeeding to the first loop exclusive instruction,
and a second address of an instruction to be executed immediately after the decoded certain number of instructions in the branch target storage means, and wherein

the branch executing means, if having judged to execute a loop, reads the first address, the decoded certain number of instructions, and the second address registered in the branch target storage means, and controls the pipeline so that the executing unit executes the decoded certain number of instructions starting from the instruction at the first address and the fetching unit fetches instructions starting from the instruction at the second address.

14. The processor as defined in Claim 4, wherein

the registering means, after the decoding unit has decoded a first loop exclusive instruction, registers a decoded certain number of instructions succeeding to the first loop exclusive instruction and an address of an instruction to be executed immediately after the decoded certain number of instructions in the branch target storage means, and wherein

the branch executing means, if having judged to execute a loop, reads the decoded certain number of instructions and the address registered in the branch target storage means, and controls the pipeline so that the executing unit executes the decoded certain number of instructions starting from an instruction at an address specified by the second loop exclusive instruction and the fetching unit fetches instructions starting from an instruction at the address registered in the branch target storage means.

15. The processor as defined in Claim 4, wherein

the registering means, after the decoding unit has decoded a first loop exclusive instruction, registers a decoded certain number of instructions succeeding to the first loop exclusive instruction and an address of an instruction to be executed immediately after the decoded certain number of instructions in the branch target storage means, and wherein

the branch executing means, if having judged to execute a loop, reads the decoded certain number of instructions and the address registered in the branch target storage means, and controls the pipeline so that the executing unit executes the decoded certain number of instructions starting from the read address and the fetching unit fetches instructions starting from an instruction at an address which is obtained by performing a certain computation on the read address.
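
The behaviour recited in Claims 10 to 15 can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the claimed hardware: the names `BranchTargetEntry`, `register_entry`, `take_loop_branch`, `LOOP_SET` and `LOOP_BR` are hypothetical, a fixed 4-byte instruction width is assumed, and the "certain computation" is assumed to be "skip past the buffered instructions". It models two choices that the claims vary: whether the buffered instructions are stored before or after decoding, and whether the address at which fetching resumes is registered directly or computed.

```python
from dataclasses import dataclass
from typing import List, Optional

WORD = 4  # ASSUMPTION: fixed instruction width in bytes; the claims do not state one


@dataclass
class BranchTargetEntry:
    """One entry of the branch target storage means (field names are illustrative)."""
    first_address: int              # instruction following the first loop exclusive instruction
    instructions: List[str]         # the "certain number" of buffered instructions
    decoded: bool                   # True if buffered in decoded form (as in Claims 11 to 15)
    second_address: Optional[int]   # address following the buffered run, if registered


def register_entry(memory, loop_insn_addr, count, decoded, store_second):
    """Model the registering means after a first loop exclusive instruction is decoded."""
    first = loop_insn_addr + WORD
    run = [memory[first + i * WORD] for i in range(count)]
    second = first + count * WORD if store_second else None
    return BranchTargetEntry(first, run, decoded, second)


def take_loop_branch(entry):
    """Model the branch executing means once it has judged to execute the loop.

    Returns (pipeline stage receiving the buffered instructions,
             the buffered instructions,
             address from which the fetching unit resumes).
    """
    # Decoded instructions bypass the decoder; undecoded ones re-enter at decode.
    stage = "execute" if entry.decoded else "decode"
    if entry.second_address is not None:
        fetch_from = entry.second_address  # resume address was registered directly
    else:
        # ASSUMPTION: the "certain computation" skips the buffered run
        fetch_from = entry.first_address + len(entry.instructions) * WORD
    return stage, list(entry.instructions), fetch_from


# Hypothetical loop: LOOP_SET marks the entry, LOOP_BR branches back to it.
memory = {0x100: "LOOP_SET", 0x104: "add", 0x108: "mul", 0x10C: "cmp", 0x110: "LOOP_BR"}
entry = register_entry(memory, 0x100, count=2, decoded=True, store_second=True)
stage, run, fetch_from = take_loop_branch(entry)
```

In this model the two buffered instructions are handed to the executing stage at once while fetching restarts at the third instruction of the loop body, so the fetch and decode stages need not be refilled from the loop entry on each iteration.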